| Variable | N = 215¹ |
|---|---|
| Age | |
| Mean, Median (IQR) | 52, 54 (45, 62) |
| Range | 11, 91 |
| SD | 13 |
| BMI | |
| Mean, Median (IQR) | 29.1, 28.6 (26.1, 31.2) |
| Range | 19.3, 51.3 |
| SD | 4.8 |
| Sex | |
| M | 75, 35%; 0 |
| F | 140, 65%; 0 |
| CVD | |
| No CVD | 197, 92%; 0 |
| CVD | 18, 8.4%; 0 |
| Diabetes | |
| No Diabetes | 139, 65%; 0 |
| Diabetes | 76, 35%; 0 |
| Metabolic_Syndrome | |
| No syndrome | 150, 70%; 0 |
| Syndrome | 65, 30%; 0 |
| Liver_stage | |
| Non significant | 176, 82%; 0 |
| Significant | 39, 18%; 0 |
| Hypertension | |
| No Hypertension | 147, 68%; 0 |
| Hypertension | 68, 32%; 0 |

¹ n, %; N missing
Logistic Regression Analysis of Risk Factors for Liver Fibrosis Progression in NASH Patients
1 Introduction
Nonalcoholic steatohepatitis (NASH) is the most severe form of nonalcoholic fatty liver disease (NAFLD), a condition in which the liver builds up excessive fat deposits. NASH occurs when the fat buildup causes inflammation and damage. It often has no outward signs or symptoms and is therefore considered underdiagnosed; when symptoms do occur, the most common are fatigue and mild pain in the upper right abdomen. Risk factors for NASH include being overweight or obese and having certain clinical conditions such as diabetes, metabolic syndrome, sleep apnea, polycystic ovary syndrome, and an underactive thyroid. Several technologies are available for diagnosing NASH. Diagnosis begins with a physical exam and a review of the clinical history; if the condition is suspected, the patient then undergoes a liver biopsy (n.d.).
The main complication of NASH is severe liver scarring or cirrhosis. Cirrhosis occurs due to liver injury, such as the damage caused by inflammation in NASH. The liver creates areas of scarring (fibrosis) while attempting to stop the inflammation (Clinic, 2021). Fibrosis occurs in different stages and can be assessed using multiple scales. If caused by NASH, the degree of liver damage can be evaluated with a liver biopsy or histology, and liver fibrosis can be staged based on the METAVIR scoring system, which assigns a score based on inflammation (activity) and damage (fibrosis). The fibrosis stages range from F0 to F4, where F0 indicates no fibrosis, and F4 indicates cirrhosis (Nall & Cherney, 2023).
Studies have shown that the risk of progression to liver cirrhosis in NASH patients is 10-25%. This risk also depends on the ethnic origin of the patients, as Hispanic Americans of Mexican origin have a greater predisposition to developing NASH. Mexico is one of the countries with the highest prevalence of metabolic disease; hence, a multicenter retrospective cross-sectional study was conducted from January 2012 to December 2017 to investigate the main metabolic factors involved in the progression to advanced fibrosis in Mexican patients with NASH (Méndez-Sánchez et al., 2020; Dongiovanni et al., 2015; Teli, 1995; Younossi et al., 2016; Romero-Martínez et al., 2019).
Project Goal
The goal of this project is to examine the associations between the progression of liver fibrosis and various clinical conditions by analyzing the clinical records of a cohort of Mexican patients diagnosed with nonalcoholic steatohepatitis (NASH).
Scientific Objectives
The analysis will focus on the following scientific objectives:
1. Explore the relationship between the progression of liver fibrosis and cardiovascular disease while accounting for age, sex, and BMI.
2. Investigate the relationship between the progression of liver fibrosis and metabolic syndrome while accounting for age, sex, and BMI.
3. Analyze the association between the progression of liver fibrosis and type 2 diabetes mellitus while accounting for age, sex, and BMI.
4. Examine the association between the progression of liver fibrosis and systemic arterial hypertension while accounting for age, sex, and BMI.
2 Data Description
The dataset used for this project was obtained from a multicenter retrospective cross-sectional study conducted from January 2012 to December 2017. The study aimed to investigate the impact of various clinical conditions on the progression to advanced fibrosis in Mexican patients with NASH. It includes information from 215 enrolled patients with biopsy-proven NASH and fibrosis. NASH diagnosis was based on the NAS score, and liver fibrosis was staged according to the Kleiner scoring system. The dataset comprises 31 variables obtained through a review of clinical records, encompassing:
- Demographic information (e.g., age, gender)
- Anthropometric measurements (e.g., BMI, waist circumference)
- Clinical diagnoses (e.g., metabolic syndrome, type 2 diabetes mellitus, systemic arterial hypertension)
- Laboratory parameters (e.g., liver function tests, lipid profile, glucose levels)
- Histological findings from liver biopsies
3 Methods
Data Preparation
The dataset was tidied to ensure both completeness and accuracy. Key steps included creating a new binary outcome variable, Liver_stage, to categorize patients by fibrosis stage: patients with stage F0-F2 fibrosis were classified as having non-significant liver fibrosis, and those with stage F3-F4 fibrosis as having significant fibrosis. Additionally, data types were converted as needed, and only the variables essential for the analyses were selected.
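The grouping of fibrosis stages into a binary outcome can be sketched as follows. This is a minimal Python illustration, not the code used for the analysis; the function name `liver_stage` and the integer coding of stages (0 to 4 for F0 to F4) are assumptions made for the example, and the stage values are hypothetical.

```python
def liver_stage(fibrosis_stage: int) -> str:
    """Map a fibrosis stage (0-4, standing for F0-F4) to the binary outcome."""
    if fibrosis_stage not in range(5):
        raise ValueError(f"unexpected fibrosis stage: {fibrosis_stage}")
    # F0-F2 -> non-significant fibrosis, F3-F4 -> significant fibrosis
    return "Significant" if fibrosis_stage >= 3 else "Non significant"


# Hypothetical stages for six patients (not taken from the study data).
stages = [0, 1, 2, 3, 4, 2]
outcome = [liver_stage(s) for s in stages]
print(outcome)
```

Raising an error on unexpected values, rather than silently assigning a category, makes coding problems in the source records visible during data preparation.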
Exploratory Data Analysis
Exploratory data analysis was conducted to gain insights into the distribution of variables and the relationships between the outcome and predictors of interest. This process was implemented in two stages:
- Descriptive Statistics: A summary table was created to display various summary measures of the variables used in the analysis.
- Data Visualization: Visual exploration of the relationships between variables was performed, including the creation of side-by-side bar plots to examine the association between the outcome and various predictors.
Statistical Analysis
The goal of this project was to analyze the associations between the progression of liver disease and various clinical conditions, including cardiovascular disease, metabolic syndrome, type 2 diabetes mellitus, and systemic arterial hypertension, within a cohort of Mexican patients with NASH. To analyze these associations, we employed binary logistic regression models, fitting the following models:
Model 1: \(Log(odds(Y_{i}))=\beta_0 + \beta_1I_{CVD=CVD} + \beta_2Age + \beta_3I_{Sex=F} +\beta_4BMI\)
Model 2: \(Log(odds(Y_{i}))=\beta_0 + \beta_1I_{MetabolicSyndrome=Syndrome} + \beta_2Age + \beta_3I_{Sex=F} + \beta_4BMI\)
Model 3: \(Log(odds(Y_{i}))=\beta_0 + \beta_1I_{Diabetes=Diabetes} + \beta_2Age + \beta_3I_{Sex=F} + \beta_4BMI\)
Model 4: \(Log(odds(Y_{i}))=\beta_0 + \beta_1I_{Hypertension=Hypertension} + \beta_2Age + \beta_3I_{Sex=F} +\beta_4BMI\)
Where:
- \(Y_i\): This represents the outcome of interest, which is a binary variable indicating whether there was significant progression of liver fibrosis or not.
- \(\beta_1\): This is the parameter of interest. It represents the difference in the log odds of significant liver fibrosis between patients with the condition and patients at the reference level (without the condition) who have the same age, sex, and BMI. This parameter will be interpreted on the odds ratio scale as \(\exp(\beta_1)\).
- The parameters are unknown but will be estimated using the method of maximum likelihood.
- Statistical evidence will be assessed by testing the following hypotheses:
- \(H_0: \beta_1 = 0\) (Null Hypothesis: There is no difference in the log-odds of liver fibrosis progression when comparing the predictors of interest to their reference level, adjusting for age, sex, and BMI).
- \(H_1: \beta_1 \ne 0\) (Alternative Hypothesis: There is a difference in the log-odds of liver fibrosis progression when comparing the predictors of interest to their reference level, adjusting for age, sex, and BMI).
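- To prevent overfitting and ensure generalizability, we adhered to the rule of thumb of having at most \(n \times p_{min}/15\) predictors in the model, where \(n\) is the sample size and \(p_{min}\) is the proportion of patients in the less frequent outcome category.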
- Forest plots were utilized to visually represent the results of the logistic regression models.
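To illustrate how an estimated coefficient is moved to the odds ratio scale and tested against \(H_0: \beta_1 = 0\), the following minimal Python sketch computes \(\exp(\beta_1)\), a Wald 95% confidence interval, and a two-sided Wald p-value. The function name `odds_ratio_summary` and the numeric inputs are hypothetical. The intervals reported in this report may have been computed differently (for example, by profile likelihood), so this sketch is not expected to reproduce them exactly.

```python
import math


def odds_ratio_summary(beta: float, se: float, z_crit: float = 1.959964):
    """Return (odds ratio, lower, upper, two-sided Wald p-value).

    beta is a log-odds coefficient and se its standard error;
    z_crit is the normal quantile for a 95% interval.
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    z = beta / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return odds_ratio, lower, upper, p_value


# Hypothetical estimate and standard error, for illustration only.
or_, lo, hi, p = odds_ratio_summary(beta=1.0, se=0.4)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}, p = {p:.3f}")
```

An interval that excludes 1 on the odds ratio scale corresponds to rejecting \(H_0\) at the 5% level under the Wald approximation.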
Model Assumptions
In order to ensure the validity and proper interpretation of coefficient estimates, p-values, and confidence intervals, it is important that several assumptions of the models are satisfied. These assumptions include:
- Linearity in the Logit: This assumes that continuous predictors exhibit a linear relationship with the log-odds of the outcome. This was assessed using a Component-Residual (CR) plot.
- Binary Outcome: This assumes that the outcome variable has only two possible values.
In addition to these assumptions, several other important considerations for logistic regression models include:
- Absence of Multicollinearity: This assumes that the independent variables are not highly correlated with each other. Multicollinearity was assessed using the variance inflation factor (VIF).
- Absence of Outliers: This assumes that there are no individual observations with very large deviance residuals. Outliers were assessed using a Bonferroni-adjusted test on the studentized residuals.
- Absence of Influential Observations: This assumes that no individual observations excessively influence the regression coefficients. Influential observations were identified using the influence index plot.
Methodology: AI-Assisted Writing
This report’s text was reviewed and refined using several AI-powered tools: Grammarly for grammar and style checking, and Claude and ChatGPT for general writing advice and suggestions. While these AI assistants were used to enhance clarity and correctness, all core ideas, analyses, and conclusions are the author’s own.
4 Results
Exploratory Data Analysis
Descriptive Statistics
Data Visualizations
Age
BMI
Liver Stage and CVD
Liver Stage and Diabetes
Liver Stage and Metabolic Syndrome
Liver Stage and systemic arterial hypertension
Statistical Analysis
Fitted Models
Outcome: Liver stage

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 | 0.00 – 0.03 | <0.001 |
| CVD [CVD] | 2.08 | 0.65 – 6.08 | 0.193 |
| Age | 1.05 | 1.02 – 1.09 | 0.003 |
| Sex [F] | 1.67 | 0.76 – 3.93 | 0.216 |
| BMI | 1.07 | 0.99 – 1.15 | 0.087 |
| Observations | 215 | | |
| R² Tjur | 0.070 | | |

Outcome: Liver stage

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 | 0.00 – 0.05 | <0.001 |
| Diabetes [Diabetes] | 2.83 | 1.35 – 6.05 | 0.006 |
| Age | 1.05 | 1.02 – 1.09 | 0.004 |
| Sex [F] | 1.47 | 0.67 – 3.45 | 0.354 |
| BMI | 1.05 | 0.97 – 1.13 | 0.223 |
| Observations | 215 | | |
| R² Tjur | 0.102 | | |

Outcome: Liver stage

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 | 0.00 – 0.04 | <0.001 |
| Metabolic Syndrome [Syndrome] | 2.59 | 1.21 – 5.60 | 0.014 |
| Age | 1.06 | 1.02 – 1.09 | 0.002 |
| Sex [F] | 1.54 | 0.70 – 3.61 | 0.295 |
| BMI | 1.05 | 0.97 – 1.13 | 0.241 |
| Observations | 215 | | |
| R² Tjur | 0.092 | | |

Outcome: Liver stage

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 | 0.00 – 0.08 | 0.001 |
| Hypertension [Hypertension] | 2.68 | 1.25 – 5.82 | 0.011 |
| Age | 1.04 | 1.01 – 1.08 | 0.014 |
| Sex [F] | 1.59 | 0.72 – 3.71 | 0.263 |
| BMI | 1.05 | 0.96 – 1.13 | 0.261 |
| Observations | 215 | | |
| R² Tjur | 0.093 | | |
Forest Plots
Model 1
Model 2
Model 3
Model 4
5 Discussion
The objective of this study was to examine the association between advanced liver disease progression and conditions such as cardiovascular disease, metabolic syndrome, type 2 diabetes mellitus, and systemic arterial hypertension. We utilized data from a multicenter retrospective cross-sectional study conducted from January 2012 to December 2017, which aimed to investigate the impact of various clinical conditions on the progression to advanced fibrosis in Mexican patients with non-alcoholic steatohepatitis (NASH).
Descriptive statistics revealed that patients ranged in age from 11 to 91 years (Mean = 52, SD = 13), with a body mass index (BMI) ranging from 19.3 to 51.3 (Mean = 29.1, SD = 4.8). The majority of patients were female (65%), non-diabetic (65%), without metabolic syndrome (70%), without hypertension (68%), without cardiovascular disease (CVD) (92%), and in the non-significant liver fibrosis group (82%).
We generated side-by-side bar graphs to visualize the distribution of liver fibrosis progression across different conditions. The graphs depicted the percentages of patients with significant liver fibrosis for each condition:
- Diabetes:
  - 17% of patients without diabetes had significant liver fibrosis.
  - 30% of patients with diabetes had significant liver fibrosis.
- Metabolic syndrome:
  - 13% of patients without metabolic syndrome had significant liver fibrosis.
  - 29% of patients with metabolic syndrome had significant liver fibrosis.
- Systemic arterial hypertension:
  - 12% of patients without hypertension had significant liver fibrosis.
  - 32% of patients with hypertension had significant liver fibrosis.
- CVD:
  - 17% of patients without CVD had significant liver fibrosis.
  - 33% of patients with CVD had significant liver fibrosis.
Multiple binary logistic regression models were fitted to investigate the association between the odds of significant liver fibrosis and the previously mentioned conditions in Mexican patients with NASH. After adjusting for age, sex, and BMI, our findings revealed:
- Diabetes was significantly positively associated with significant liver fibrosis (Adjusted Odds Ratio [AOR] = 2.83; 95% Confidence Interval [CI]: 1.35-6.05; p = 0.006). Patients with diabetes had 2.83 times the odds of significant liver fibrosis compared with patients without diabetes.
- Metabolic syndrome was significantly positively associated with significant liver fibrosis (AOR = 2.59; 95% CI: 1.21-5.60; p = 0.014). Patients with metabolic syndrome had 2.59 times the odds of significant liver fibrosis compared with patients without metabolic syndrome.
- Systemic arterial hypertension was significantly positively associated with significant liver fibrosis (AOR = 2.68; 95% CI: 1.25-5.82; p = 0.011). Patients with systemic arterial hypertension had 2.68 times the odds of significant liver fibrosis compared with patients without hypertension.
- CVD was not significantly associated with significant liver fibrosis (AOR = 2.08; 95% CI: 0.65-6.08; p = 0.193).
We conducted diagnostics to assess potential violations of model assumptions. For each model, we observed that:
- The absence of multicollinearity assumption was satisfied.
- The linearity assumption was violated for the Age variable.
- No outliers or influential observations were detected.
To address the violation of the linearity assumption, we included a quadratic term for age in each model. While this led to changes in the parameter estimates, we continued to observe significantly positive associations between diabetes, metabolic syndrome, hypertension, and significant liver fibrosis. We repeated the diagnostic checks on the newly fitted models: the linearity assumption was satisfied and no outliers or influential observations were detected. The only large VIFs (approximately 95 to 97) were for Age and its square, which is expected when a variable and its own polynomial term enter the same model. In addition, we performed an analysis of deviance (likelihood ratio test) to assess whether including the quadratic term improved model fit, and the improvement was statistically significant in all four models (p = 0.012 to 0.022).
However, the study has limitations. The data came from a multicenter study involving seven hospitals, which may introduce clustering effects that the standard logistic regression did not account for. Despite this limitation, our study offered valuable insights into the associations between various clinical conditions and advanced liver disease progression in Mexican patients with NASH. The consistent positive associations emphasize the importance of comprehensive metabolic health management in NASH patients. Future prospective studies with larger samples could further validate and extend these findings.
6 Appendix
Multicollinearity
The following table presents the Variance Inflation Factors (VIF) for the models:
CVD Age Sex BMI
1.023168 1.044275 1.048227 1.058761
Diabetes Age Sex BMI
1.033449 1.043600 1.033325 1.089989
Metabolic_Syndrome Age Sex BMI
1.071288 1.054968 1.030856 1.104333
Hypertension Age Sex BMI
1.089462 1.096016 1.026487 1.109904
Influence Index Plots
Influence index plots for all models are displayed in the layout below:
Model 1
Model 2
Model 3
Model 4
CR Plots for Linearity
Model 1
Model 2
Model 3
Model 4
Outliers
The outlier tests for each model are summarized below:
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferroni p
134 2.243348 0.024874 NA
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferroni p
4 2.282593 0.022454 NA
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferroni p
134 2.34063 0.019251 NA
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferroni p
56 2.247651 0.024598 NA
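The `NA` entries in the Bonferroni column are what one would expect if the adjusted value is the unadjusted p-value multiplied by the number of observations, with products above 1 left unreported. That convention is an assumption about the software, not something stated in the output. A minimal Python sketch under that assumption, using a hypothetical helper `bonferroni_p`:

```python
def bonferroni_p(unadjusted_p: float, n_tests: int):
    """Bonferroni-adjusted p-value; None when the product exceeds 1.

    Returning None mirrors the assumed convention of reporting NA
    instead of a probability larger than 1.
    """
    adjusted = unadjusted_p * n_tests
    return adjusted if adjusted <= 1 else None


# Largest studentized residual in the first output above, with n = 215.
# The product is far above 1, so no adjusted p-value is reported.
print(bonferroni_p(0.024874, 215))
```

Under this reading, none of the largest residuals comes close to the Bonferroni threshold of 0.05.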
Sensitivity Analysis
Fitted Models
Outcome: Liver stage

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 | 0.00 – 0.00 | 0.005 |
| CVD [CVD] | 2.00 | 0.62 – 5.96 | 0.225 |
| Age | 1.56 | 1.11 – 2.53 | 0.034 |
| Age^2 | 1.00 | 0.99 – 1.00 | 0.055 |
| Sex [F] | 1.37 | 0.61 – 3.26 | 0.464 |
| BMI | 1.06 | 0.98 – 1.14 | 0.135 |
| Observations | 215 | | |
| R² Tjur | 0.085 | | |

Outcome: Liver stage

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 | 0.00 – 0.00 | 0.007 |
| Diabetes [Diabetes] | 2.69 | 1.28 – 5.77 | 0.009 |
| Age | 1.54 | 1.10 – 2.50 | 0.042 |
| Age^2 | 1.00 | 0.99 – 1.00 | 0.066 |
| Sex [F] | 1.23 | 0.54 – 2.93 | 0.629 |
| BMI | 1.05 | 0.96 – 1.13 | 0.269 |
| Observations | 215 | | |
| R² Tjur | 0.111 | | |

Outcome: Liver stage

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 | 0.00 – 0.00 | 0.006 |
| Metabolic Syndrome [Syndrome] | 2.41 | 1.12 – 5.19 | 0.024 |
| Age | 1.53 | 1.10 – 2.48 | 0.042 |
| Age^2 | 1.00 | 0.99 – 1.00 | 0.070 |
| Sex [F] | 1.31 | 0.58 – 3.10 | 0.526 |
| BMI | 1.04 | 0.96 – 1.13 | 0.291 |
| Observations | 215 | | |
| R² Tjur | 0.105 | | |

Outcome: Liver stage

| Predictors | Odds Ratios | CI | p |
|---|---|---|---|
| (Intercept) | 0.00 | 0.00 – 0.00 | 0.007 |
| Hypertension [Hypertension] | 2.75 | 1.27 – 6.06 | 0.011 |
| Age | 1.56 | 1.11 – 2.55 | 0.035 |
| Age^2 | 1.00 | 0.99 – 1.00 | 0.049 |
| Sex [F] | 1.37 | 0.61 – 3.27 | 0.454 |
| BMI | 1.04 | 0.96 – 1.13 | 0.350 |
| Observations | 215 | | |
| R² Tjur | 0.111 | | |
Diagnostic Checking
Multicollinearity
The following table presents the Variance Inflation Factors (VIF) for the models:
CVD Age I(Age^2) Sex BMI
1.035541 96.591303 96.905254 1.074820 1.069808
Diabetes Age I(Age^2) Sex BMI
1.026118 97.161832 97.378446 1.056633 1.090481
Metabolic_Syndrome Age I(Age^2) Sex BMI
1.059507 94.502842 94.828262 1.047213 1.105071
Hypertension Age I(Age^2) Sex BMI
1.110465 95.873987 96.183794 1.043001 1.110759
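The large VIFs for Age and I(Age^2) reflect the near-linear relation between a positive variable and its own square. The Python sketch below illustrates this with hypothetical ages and the two-predictor simplification \(VIF = 1/(1 - r^2)\), where \(r\) is the Pearson correlation between the two predictors. This is an assumption made for illustration: the values above come from models with five terms and may have been computed by a different method, so the numbers will not match.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF when the model contains only these two predictors."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical ages (not the study data), evenly spaced from 20 to 80.
ages = list(range(20, 81, 5))
vif_raw = vif_two_predictors(ages, [a ** 2 for a in ages])

# Centering age before squaring removes most of the correlation; with these
# symmetric ages it removes it entirely.
mean_age = sum(ages) / len(ages)
centered = [a - mean_age for a in ages]
vif_centered = vif_two_predictors(centered, [c ** 2 for c in centered])

print(f"raw: {vif_raw:.1f}, centered: {vif_centered:.1f}")
```

Collinearity of this kind inflates the standard errors of the Age and Age^2 terms themselves, while the VIFs of the clinical predictors of interest stay close to 1 in the output above.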
Influence Index Plots
Influence index plots for all models are displayed in the layout below:
Model 1
Model 2
Model 3
Model 4
CR plots for linearity
Model 1
Model 2
Model 3
Model 4
Outliers
The outlier tests for each model are summarized below:
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferroni p
56 2.340903 0.019237 NA
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferroni p
82 2.333315 0.019632 NA
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferroni p
82 2.307284 0.021039 NA
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
rstudent unadjusted p-value Bonferroni p
56 2.411397 0.015892 NA
Anova Tests
Analysis of Deviance Table
Model 1: Liver_stage ~ CVD + Age + Sex + BMI
Model 2: Liver_stage ~ CVD + Age + I(Age^2) + Sex + BMI
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 210 187.71
2 209 181.74 1 5.972 0.01453 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Deviance Table
Model 1: Liver_stage ~ Diabetes + Age + Sex + BMI
Model 2: Liver_stage ~ Diabetes + Age + I(Age^2) + Sex + BMI
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 210 181.72
2 209 176.31 1 5.411 0.02001 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Deviance Table
Model 1: Liver_stage ~ Metabolic_Syndrome + Age + Sex + BMI
Model 2: Liver_stage ~ Metabolic_Syndrome + Age + I(Age^2) + Sex + BMI
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 210 183.33
2 209 178.10 1 5.2316 0.02218 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Deviance Table
Model 1: Liver_stage ~ Hypertension + Age + Sex + BMI
Model 2: Liver_stage ~ Hypertension + Age + I(Age^2) + Sex + BMI
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 210 182.84
2 209 176.53 1 6.3195 0.01194 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
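The p-values in the analysis of deviance tables can be checked by hand: each test compares the drop in residual deviance to a chi-square distribution with 1 degree of freedom. The minimal Python sketch below uses the identity \(P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})\); the helper name `chi2_sf_1df` is hypothetical, and the deviance values are copied from the tables above.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# Deviance reductions from the four tables above (1 degree of freedom each).
deviance_drop = {
    "CVD": 5.972,
    "Diabetes": 5.411,
    "Metabolic_Syndrome": 5.2316,
    "Hypertension": 6.3195,
}

p_values = {name: chi2_sf_1df(d) for name, d in deviance_drop.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.5f}")
```

Small differences from the tabulated values in the last digits can arise because the deviance reductions are rounded in the printed output.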